Lexical data augmentation for sentiment analysis
Authors
Abstract
Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks, including sentiment analysis. However, deep learning models are more demanding of training data. Data augmentation techniques are widely used to generate new instances, based on modifications to existing data or relying on external knowledge bases, to address annotated data scarcity, which hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags, as well as a variety of strategies such as semantic-based substitution methods and sampling methods, are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods: one is thesaurus-based, the other is lexicon manipulation based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared with the two previous lexical substitution methods, averaged over the five benchmarks. Introducing the POS constraint and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
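The substitution idea in the abstract can be illustrated with a minimal sketch. This is not the authors' PLSDA implementation: the toy POS dictionary, the toy word vectors, and the function names `candidates` and `augment` are all assumptions made for illustration. A real pipeline would presumably use a trained POS tagger and pretrained embeddings or a thesaurus. The sketch picks words whose POS tag is in a chosen set, keeps same-POS substitutes whose cosine similarity meets a threshold, and samples uniformly among them to build new instances.

```python
import math
import random

# Toy resources invented for illustration only (assumptions, not data from the paper).
TOY_POS = {
    "the": "DET", "was": "VERB",
    "movie": "NOUN", "film": "NOUN", "plot": "NOUN",
    "great": "ADJ", "good": "ADJ", "superb": "ADJ", "terrible": "ADJ",
}
TOY_VECTORS = {
    "movie": (0.0, 0.1, 0.9), "film": (0.0, 0.2, 0.9), "plot": (0.1, 0.9, 0.3),
    "great": (0.9, 0.1, 0.0), "good": (0.8, 0.2, 0.0),
    "superb": (0.95, 0.05, 0.0), "terrible": (-0.9, 0.1, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def candidates(word, threshold):
    """Same-POS words whose cosine similarity to `word` is at least `threshold`."""
    if word not in TOY_VECTORS:
        return []
    tag, vec = TOY_POS[word], TOY_VECTORS[word]
    return sorted(
        other
        for other, other_vec in TOY_VECTORS.items()
        if other != word
        and TOY_POS[other] == tag
        and cosine(vec, other_vec) >= threshold
    )


def augment(tokens, target_tags, threshold, n_new, rng):
    """Build `n_new` instances by replacing every word whose POS is in `target_tags`.

    Substitutes are sampled uniformly, so generated instances may repeat.
    Returns an empty list when no word has a valid substitute.
    """
    slots = {}
    for i, tok in enumerate(tokens):
        if TOY_POS.get(tok) in target_tags:
            options = candidates(tok, threshold)
            if options:
                slots[i] = options
    if not slots:
        return []
    new_instances = []
    for _ in range(n_new):
        new_tokens = list(tokens)
        for i, options in slots.items():
            new_tokens[i] = rng.choice(options)
        new_instances.append(new_tokens)
    return new_instances


if __name__ == "__main__":
    rng = random.Random(0)
    for sent in augment(["the", "movie", "was", "great"], {"ADJ"}, 0.8, 3, rng):
        print(" ".join(sent))
```

In this sketch the similarity threshold and the number of new instances are plain arguments, mirroring the two hyperparameters the abstract says were optimized. Note that "terrible" is excluded as a substitute for "great" only because its toy vector was chosen to point in the opposite direction; with real embeddings, antonyms can be close, which is one reason the choice of substitution strategy matters.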
Similar resources
Is there a language of sentiment? An analysis of lexical resources for sentiment analysis
In recent years, sentiment analysis (SA) has emerged as a rapidly expanding field of application and research in the area of information retrieval. In order to facilitate the task of selecting lexical resources for automated SA systems, this paper sets out a detailed analysis of four widely used sentiment lexica. The analysis provides an overview of the coverage of each lexicon individually, th...
Sentiment Analysis of Social Networking Data Using Categorized Dictionary
Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding correct opinion or interest from multi-facet sentiment data is a tedious task. In this paper, a method to improve the sentiment accuracy by utilizing the concept of categorized dictionary for sentiment classification and analysis is proposed. A categorized dictiona...
GermanPolarityClues: A Lexical Resource for German Sentiment Analysis
In this paper, we propose GermanPolarityClues, a new publicly available lexical resource for sentiment analysis for the German language. While sentiment analysis and polarity classification have been extensively studied at different document levels (e.g. sentences and phrases), only a few approaches explored the effect of a polarity-based feature selection and subjectivity resources for the Germ...
PYTHIA: Employing Lexical and Semantic Features for Sentiment Analysis
Sentiment analysis methods aim at identifying the polarity of a piece of text, e.g., passage, review, snippet, by analyzing lexical features at the level of the terms or the sentences. However, many of the previous works do not utilize features that can offer a deeper understanding of the text, e.g., negation phrases. In this work we demonstrate a novel piece of software, namely PYTHIA, which c...
Sentiment Analysis and Lexical Cohesion for the Story Cloze Task
We present two NLP components for the Story Cloze Task – dictionary-based sentiment analysis and lexical cohesion. While previous research found no contribution from sentiment analysis to the accuracy on this task, we demonstrate that sentiment is an important aspect. We describe a new approach, using a rule that estimates sentiment congruence in a story. Our sentiment-based system achieves str...
Journal
Journal title: Journal of the Association for Information Science and Technology
Year: 2021
ISSN: 1532-2882, 1532-2890
DOI: https://doi.org/10.1002/asi.24493